CS562/CS662 (Natural Language Processing): Evaluating machine translation quality with BLEU
Abstract
The gold standard for measuring machine translation quality is the rating of candidate sentences by experienced translators. However, automated measures are necessary for rapid iterative development. BLEU (Papineni et al. 2002) is the best-known automatic measure of translation quality. BLEU and related measures are used both to evaluate machine translation (MT) systems automatically and as objectives for training them.
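As a rough illustration of what such an automatic measure computes, here is a minimal sketch of sentence-level BLEU following the definition in Papineni et al. (2002): the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The function names (`ngrams`, `modified_precision`, `brevity_penalty`, `sentence_bleu`) are illustrative assumptions and are not taken from the course material. The sketch assumes whitespace-tokenized input, uniform weights, and no smoothing, and it scores a single sentence, whereas the original metric is defined over a whole test corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped at
    the largest count of that n-gram in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties are broken toward the shorter reference.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


if __name__ == "__main__":
    refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
    cand = "the the the the the the the".split()
    # Clipping caps "the" at 2 occurrences, so unigram precision is 2/7.
    print(modified_precision(cand, refs, 1))
    print(sentence_bleu("the cat is on the mat".split(), refs))
```

Because no smoothing is applied, any sentence with no matching 4-gram scores exactly zero, which is one reason BLEU is normally reported at the corpus level.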
Similar resources
Colouring Summaries BLEU
In this paper we attempt to apply the IBM algorithm, BLEU, to the output of four different summarizers in order to perform an intrinsic evaluation of their output. The objective of this experiment is to explore whether a metric, originally developed for the evaluation of machine translation output, could be used for assessing another type of output reliably. Changing the type of text to be eval...
Towards a Hybrid Rule-based and Statistical Arabic-French Machine Translation System
Arabic is a morphologically rich and complex language, which presents significant challenges for natural language processing and machine translation. In this paper, we describe an ongoing effort to build our first Arabic-French phrase-based machine translation system using the Moses decoder among other linguistic tools. The results show an improvement in the quality of translation and a gain i...
Natural language watermarking: Challenges in building a practical system
This paper gives an overview of the research and implementation challenges we encountered in building an end-to-end natural language processing based watermarking system. By natural language watermarking, we mean embedding the watermark into a text document, using the natural language components as the carrier, in such a way that the modifications are imperceptible to the readers and the embed...
Reordering Metrics for Statistical Machine Translation
Natural languages display a great variety of different word orders, and one of the major challenges facing statistical machine translation is in modelling these differences. This thesis is motivated by a survey of 110 different language pairs drawn from the Europarl project, which shows that word order differences account for more variation in translation performance than any other factor. This...
Automatic Assessment of Students' Free-Text Answers Underpinned by the Combination of a BLEU-Inspired Algorithm and Latent Semantic Analysis
In previous work we have proved that the BLEU algorithm (Papineni et al. 2001), originally devised for evaluating Machine Translation systems, can be applied to assessing short essays written by students. In this paper we present a comparative evaluation between this BLEU-inspired algorithm and a system based on Latent Semantic Analysis. In addition we propose an effective combination schema fo...